ISCAS: A System for Chinese Word Sense Induction Based on K-means Algorithm
Abstract
This paper presents an unsupervised method for automatic Chinese word sense induction. The algorithm clusters similar words according to the contexts in which they occur. First, the target word to be disambiguated is represented as a vector of its contexts. Then, the matrix formed by the target-word vectors is reconstructed through singular value decomposition (SVD), and the resulting vectors are used to cluster similar words. Our system participated in the CLP2010 bakeoff Task 4, Chinese word sense induction.
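The three steps in the abstract (context vectors, SVD reconstruction, clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy contexts, the bag-of-words counts, the rank of 2, the farthest-point initialisation, and the function names `context_matrix`, `svd_reduce`, and `kmeans` are all assumptions made for illustration. K-means is used because the title names it.

```python
import numpy as np

def context_matrix(contexts):
    """Build a count matrix: one row per occurrence of the target word,
    one column per context word (simple bag-of-words assumption)."""
    vocab = sorted({w for ctx in contexts for w in ctx})
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(contexts), len(vocab)))
    for i, ctx in enumerate(contexts):
        for w in ctx:
            X[i, index[w]] += 1.0
    return X, vocab

def svd_reduce(X, rank):
    """Keep the top `rank` singular directions of X (truncated SVD)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank]

def kmeans(points, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point initialisation
    (an illustrative choice, not taken from the paper)."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical toy data: four segmented contexts of an ambiguous word
# such as "苹果" (the fruit vs. the company), target word removed.
contexts = [
    ["吃", "水果", "甜"],
    ["水果", "新鲜", "吃"],
    ["手机", "公司", "发布"],
    ["公司", "电脑", "手机"],
]
X, vocab = context_matrix(contexts)
reduced = svd_reduce(X, rank=2)
labels = kmeans(reduced, k=2)
```

In this toy example the first two contexts share words with each other but not with the last two, so they fall into one cluster and the last two into another. A real system would need word segmentation, feature weighting, and a way to choose the number of senses, none of which the abstract specifies.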
Similar resources
Chinese Word Sense Induction with Basic Clustering Algorithms
Word Sense Induction (WSI) is an important topic in natural language processing. For the bakeoff task Chinese Word Sense Induction (CWSI), this paper proposes two systems using basic clustering algorithms, k-means and agglomerative clustering. Experimental results show that k-means achieves better performance. Based only on the data provided by the task organizers, the two systems get FSc...
LSTC System for Chinese Word Sense Induction
This paper presents the Chinese word sense induction system of Leshan Teachers’ College. The system participates in Task 4, Chinese word sense induction, of the Bakeoff organized by the Chinese Information Processing Society of China (CIPS) and SIGHAN. The system extracts the neighboring words and their POS tags centered on the target word and selects the best of four clustering algorithms: Simple K...
K-means and Graph-based Approaches for Chinese Word Sense Induction Task
This paper details our experiments on the Word Sense Induction task. For foreign languages (especially English), there have been many studies of word sense induction (WSI), and the approaches and techniques are increasingly mature. However, the study of Chinese WSI is just getting started, and there is not yet a good way to solve the problems encountered. WSI can be divided ...
NEUNLPLab Chinese Word Sense Induction System for SIGHAN Bakeoff 2010
This paper describes a character-based Chinese word sense induction (WSI) system for the International Chinese Language Processing Bakeoff 2010. By computing the longest common substrings between any two contexts of the ambiguous word, our system extracts collocations as features and does not depend on any extra tools, such as Chinese word segmenters. We also design a constrained clustering alg...
Soochow University: Description and Analysis of the Chinese Word Sense Induction System for CLP2010
Recent studies on word sense induction (WSI) mainly concentrate on European languages; Chinese word sense induction is becoming popular as it presents a new challenge to WSI. In this paper, we propose a feature-based approach to this problem using the spectral clustering algorithm. We also compare various clustering algorithms and similarity metrics. Experimental results show that our system ac...